action dimension
Select before Act: Spatially Decoupled Action Repetition for Continuous Control
Nie, Buqing, Fu, Yangqing, Gao, Yue
Reinforcement Learning (RL) has achieved remarkable success in various continuous control tasks, such as robot manipulation and locomotion. Unlike mainstream RL, which makes a fresh decision at every step, recent studies have incorporated action repetition into RL, achieving enhanced action persistence with improved sample efficiency and superior performance. However, existing methods treat all action dimensions as a whole during repetition, ignoring variations among them. This constraint makes repetition decisions inflexible, reducing policy agility and effectiveness. In this work, we propose a novel repetition framework called SDAR, which implements Spatially Decoupled Action Repetition by performing closed-loop act-or-repeat selection for each action dimension individually. SDAR achieves more flexible repetition strategies, leading to an improved balance between action persistence and diversity. Compared to existing repetition frameworks, SDAR is more sample efficient, with higher policy performance and reduced action fluctuation. Experiments on various continuous control scenarios demonstrate the effectiveness of the spatially decoupled repetition design proposed in this work.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Municipal District of Bighorn No. 8 > Kananaskis (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Research Report (1.00)
- Workflow (0.88)
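The per-dimension act-or-repeat mechanism described in the abstract above can be illustrated with a minimal NumPy sketch. The `policy` and `repeat_prob` functions are hypothetical stand-ins for the paper's learned networks, and the masking logic is one interpretation of the abstract, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(obs, dim=6):
    # Hypothetical stand-in for the base policy's fresh action proposal.
    return np.tanh(rng.normal(size=dim))

def repeat_prob(obs, prev_action):
    # Hypothetical stand-in for the learned act-or-repeat head: returns,
    # for each action dimension, the probability of repeating the previous
    # component instead of acting anew.
    return np.full_like(prev_action, 0.5)

def sdar_step(obs, prev_action):
    # Closed-loop act-or-repeat selection, decoupled across dimensions.
    fresh = policy(obs, dim=prev_action.shape[0])
    p_repeat = repeat_prob(obs, prev_action)
    repeat_mask = rng.random(prev_action.shape) < p_repeat
    # Each dimension independently keeps its previous value or updates.
    return np.where(repeat_mask, prev_action, fresh)

action = np.zeros(6)
for t in range(3):
    action = sdar_step(obs=None, prev_action=action)
    print(t, np.round(action, 3))
```

Because the mask is sampled per dimension at every step, some components persist while others update, which is exactly the flexibility that whole-vector repetition schemes lack.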
Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
Liu, Jijia, Gao, Feng, Liao, Qingmin, Yu, Chao, Wang, Yu
Reinforcement learning (RL) for continuous control often requires large amounts of online interaction data. Value-based RL methods can mitigate this burden by offering relatively high sample efficiency. Some studies further enhance sample efficiency by incorporating offline demonstration data to "kick-start" training, achieving promising results in continuous control. However, they typically compute the Q-function independently for each action dimension, neglecting interdependencies and making it harder to identify optimal actions when learning from suboptimal data, such as non-expert demonstrations and data collected online during training. To address these issues, we propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner. First, ARSQ decomposes the continuous action space into discrete spaces in a coarse-to-fine hierarchy, enhancing sample efficiency for fine-grained continuous control tasks. Next, it auto-regressively predicts dimensional action advantages within each decision step, enabling more effective decision-making in continuous control tasks. We evaluate ARSQ on two continuous control benchmarks, RLBench and D4RL, integrating demonstration data into online training. On D4RL, which includes non-expert demonstrations, ARSQ achieves an average $1.62\times$ performance improvement over the state-of-the-art value-based baseline. On RLBench, which incorporates expert demonstrations, ARSQ surpasses various baselines, demonstrating its effectiveness in learning from suboptimal online-collected data.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > China > Beijing > Beijing (0.04)
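A rough sketch of the coarse-to-fine, auto-regressive action selection described in the abstract above might look as follows. The `advantage` head, the two-level bin counts, and the Boltzmann sampling temperature are all illustrative assumptions rather than details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N_COARSE, N_FINE = 4, 4          # assumed bins per hierarchy level

def advantage(obs, chosen, n_bins):
    # Hypothetical stand-in for the learned per-dimension advantage head,
    # conditioned on the action components chosen so far.
    return rng.normal(size=n_bins)

def softmax_sample(logits, temperature=0.1):
    # Soft (Boltzmann) selection, in the spirit of soft Q-learning.
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(logits), p=p)

def arsq_act(obs, action_dim=3, low=-1.0, high=1.0):
    # Select each dimension auto-regressively: coarse bin, then fine bin.
    chosen = []
    for d in range(action_dim):
        # Coarse level: pick one of N_COARSE intervals covering [low, high].
        c = softmax_sample(advantage(obs, chosen, N_COARSE))
        width = (high - low) / N_COARSE
        c_low = low + c * width
        # Fine level: refine within the chosen coarse interval.
        f = softmax_sample(advantage(obs, chosen + [c], N_FINE))
        a_d = c_low + (f + 0.5) * width / N_FINE
        chosen.append(a_d)   # later dimensions condition on this choice
    return np.array(chosen)

print(np.round(arsq_act(obs=None), 3))
```

Conditioning each dimension's advantages on the components already chosen is what restores the inter-dimension dependencies that independent per-dimension Q-functions discard.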
An Advantage-based Optimization Method for Reinforcement Learning in Large Action Space
Lin, Hai, Huang, Cheng, Chen, Zhihong
Reinforcement learning tasks in real-world scenarios often involve large, high-dimensional action spaces, leading to challenges such as convergence difficulties, instability, and high computational complexity. It is widely acknowledged that traditional value-based reinforcement learning algorithms struggle to address these issues effectively. A prevalent approach generates independent sub-actions within each dimension of the action space; however, this introduces bias and hinders the learning of optimal policies. In this paper, we propose an advantage-based optimization method and an algorithm named Advantage Branching Dueling Q-network (ABQ). ABQ incorporates a baseline mechanism to tune the action value of each dimension, leveraging the advantage relationship across different sub-actions. With this approach, the learned policy can be optimized along each dimension. Empirical results demonstrate that ABQ outperforms BDQ, achieving 3%, 171%, and 84% more cumulative reward in the HalfCheetah, Ant, and Humanoid environments, respectively. Furthermore, ABQ exhibits competitive performance against two continuous-action benchmark algorithms, DDPG and TD3.
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Research Report (0.84)
- Overview (0.68)
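Since the abstract above positions ABQ as a refinement of BDQ's branching dueling architecture, a sketch of that underlying aggregation helps fix ideas. The network weights below are random stand-ins, and the mean-subtraction baseline shown is the standard BDQ form; ABQ's advantage-based tuning of each dimension refines this step in a way the abstract does not fully specify:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_DIMS, N_SUB = 8, 3, 5   # assumed: 3 branches, 5 sub-actions each

# Random stand-ins for the shared trunk, value head, and advantage branches.
W_trunk = rng.normal(size=(STATE_DIM, 16))
w_value = rng.normal(size=16)
W_branch = rng.normal(size=(N_DIMS, 16, N_SUB))

def branching_q(state):
    # Branching dueling aggregation: one value head V(s) shared by
    # per-dimension advantage branches A_d(s, a_d), combined as
    # Q_d(s, a_d) = V(s) + A_d(s, a_d) - mean_a A_d(s, a).
    h = np.tanh(state @ W_trunk)                 # shared representation
    v = h @ w_value                              # scalar state value
    adv = np.einsum('h,dhk->dk', h, W_branch)    # (N_DIMS, N_SUB) advantages
    # Baseline: subtract each branch's mean advantage so branches stay
    # identifiable; ABQ's contribution is a refined version of this step.
    return v + adv - adv.mean(axis=1, keepdims=True)

q = branching_q(rng.normal(size=STATE_DIM))
greedy_action = q.argmax(axis=1)                 # one sub-action per dimension
print(q.shape, greedy_action)
```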
MotionGlot: A Multi-Embodied Motion Generation Model
Harithas, Sudarshan, Sridhar, Srinath
This paper introduces MotionGlot, a model that can generate motion across multiple embodiments with different action dimensions, such as quadruped robots and human bodies. By leveraging the well-established training procedures commonly used in large language models (LLMs), we introduce an instruction-tuning template specifically designed for motion-related tasks. Our approach demonstrates that the principles underlying LLM training can be successfully adapted to learn a wide range of motion generation tasks across these embodiments. We demonstrate the various abilities of MotionGlot on a set of six tasks and report an average improvement of 35.3% across tasks. Additionally, we contribute two new datasets: (1) a dataset of expert-controlled quadruped locomotion with approximately 48,000 trajectories paired with direction-based text annotations, and (2) a dataset of over 23,000 situational text prompts for human motion generation tasks. Finally, we conduct hardware experiments to validate the capabilities of our system in real-world applications.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
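A minimal sketch of what an instruction-tuning template for motion tasks could look like is given below. The tag names, the assumption that motion is already discretized into token ids (e.g., by a VQ codebook), and the prompt layout are all illustrative; the paper's actual template is not specified in the abstract:

```python
# Illustrative embodiment tags; MotionGlot's real vocabulary is not given here.
EMBODIMENT_VOCAB = {"human": "<human>", "quadruped": "<quad>"}

def motion_to_tokens(motion_ids):
    # Assume motion has been discretized into integer ids; render each id
    # as a motion token that the language model can emit.
    return " ".join(f"<m{idx}>" for idx in motion_ids)

def build_example(instruction, embodiment, motion_ids):
    # One training example: instruction prompt -> motion token target.
    prompt = (f"{EMBODIMENT_VOCAB[embodiment]} "
              f"### Instruction: {instruction}\n### Motion:")
    target = f" {motion_to_tokens(motion_ids)} <eos>"
    return prompt, target

prompt, target = build_example("trot forward then turn left",
                               "quadruped", [17, 4, 4, 92])
print(prompt + target)
```

Prefixing an embodiment tag lets a single decoder share one token stream across bodies with different action dimensions, which is the cross-embodiment idea the abstract emphasizes.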
MoDex: Planning High-Dimensional Dexterous Control via Learning Neural Hand Models
Wu, Tong, Li, Shoujie, Lyu, Chuqiao, Sou, Kit-Wa, Chan, Wang-Sing, Ding, Wenbo
Controlling hands in a high-dimensional action space has been a longstanding challenge, yet humans naturally perform dexterous tasks with ease. In this paper, we draw inspiration from human embodied cognition and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that employs a neural hand model to capture the dynamical characteristics of hand movements. Based on this model, we develop a bidirectional planning method that is efficient in both training and inference. The method is further integrated with a large language model to generate various gestures such as "Scissorshand" and "Rock&Roll." Moreover, we show that decomposing the system dynamics into a pretrained hand model and an external model improves data efficiency, as supported by both theoretical analysis and empirical experiments. Additional visualization results are available at https://tongwu19.github.io/MoDex.
- Asia > China > Guangdong Province > Shenzhen (0.06)
- Asia > Middle East > Jordan (0.04)
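The abstract above does not detail the bidirectional planner, but the decomposition it highlights, a pretrained hand model composed with an external model, can be illustrated with a generic random-shooting model-predictive control (MPC) loop. The weights, cost function, and planner below are stand-ins, and the external model is collapsed into a simple task cost for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACT_DIM, HORIZON, N_CAND = 12, 20, 5, 64   # assumed sizes

W_hand = rng.normal(size=(STATE_DIM + ACT_DIM, STATE_DIM)) * 0.1

def hand_model(state, action):
    # Stand-in for the pretrained neural hand model: predicts the next
    # hand state from the current state and high-dimensional action.
    return np.tanh(np.concatenate([state, action]) @ W_hand)

def external_cost(state, goal):
    # Stand-in for the external (task/object) model, reduced to a cost.
    return np.sum((state - goal) ** 2)

def plan(state, goal):
    # Random-shooting planner over the composed dynamics: roll candidate
    # action sequences through the hand model, score with the task cost.
    best_seq, best_cost = None, np.inf
    for _ in range(N_CAND):
        seq = rng.uniform(-1, 1, size=(HORIZON, ACT_DIM))
        s, cost = state, 0.0
        for a in seq:
            s = hand_model(s, a)
            cost += external_cost(s, goal)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq[0]   # execute the first action, then replan (MPC)

a0 = plan(np.zeros(STATE_DIM), goal=np.full(STATE_DIM, 0.3))
print(a0.shape, np.round(a0[:4], 3))
```

Keeping the hand dynamics in a reusable pretrained model means only the lightweight external model changes across tasks, which is the data-efficiency argument the abstract makes.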
Causality-Driven Reinforcement Learning for Joint Communication and Sensing
Roy, Anik, Banerjee, Serene, Sadasivan, Jishnu, Sarkar, Arnab, Dey, Soumyajit
Next-generation wireless networks (6G and beyond) envision integrating communication and sensing to overcome interference, improve spectrum efficiency, and reduce hardware and power consumption. Massive Multiple-Input Multiple-Output (mMIMO)-based Joint Communication and Sensing (JCAS) systems realize this integration for 6G applications such as autonomous driving, which requires accurate environmental sensing and time-critical communication with neighboring vehicles. Reinforcement Learning (RL) has been used for mMIMO antenna beamforming in the existing literature. However, the huge search space of beamforming actions makes the RL agent's learning process inefficient due to high beam-training overhead. Moreover, the learning process does not consider the causal relationship between the action space and the reward, giving all actions equal importance. In this work, we explore a causally aware RL agent that can intervene and discover causal relationships in mMIMO-based JCAS environments during the training phase. We use a state-dependent action dimension selection strategy to realize causal discovery for RL-based JCAS. Evaluation of the causally aware RL framework in different JCAS scenarios shows the benefit of our proposed framework over baseline methods in terms of beamforming gain.
- Asia > India > West Bengal > Kharagpur (0.05)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > New York > New York County > New York City (0.04)
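The state-dependent action dimension selection strategy named in the abstract above can be sketched as a masking step in front of the beam optimizer. The relevance scores, the top-k rule, and the toy per-dimension optimizer are illustrative assumptions; the paper's intervention-based causal discovery is summarized here as a learned scoring function:

```python
import numpy as np

rng = np.random.default_rng(0)
N_BEAM_PARAMS = 8          # assumed size of the beamforming action vector

def causal_relevance(state):
    # Stand-in: score how strongly each action dimension influences reward
    # in this state; a causally aware agent would learn these scores by
    # intervening during training.
    return 1.0 / (1.0 + np.exp(-state[:N_BEAM_PARAMS]))

def select_action(state, per_dim_optimizer, top_k=3):
    # Restrict beam search to the top-k causally relevant dimensions,
    # shrinking the effective action space and the beam-training overhead.
    scores = causal_relevance(state)
    active = np.argsort(scores)[-top_k:]              # most relevant dims
    action = np.zeros(N_BEAM_PARAMS)                  # inactive dims stay default
    action[active] = per_dim_optimizer(state, active) # optimize active dims only
    return action

def toy_optimizer(state, dims):
    # Stand-in optimizer for the active dimensions.
    return np.tanh(state[dims])

s = rng.normal(size=16)
print(np.round(select_action(s, toy_optimizer), 3))
```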